EVOC 20 PolySynth Unvoiced/Voiced (U/V) detection parameters

Human speech consists of a series of voiced sounds (tonal sounds, or formants) and unvoiced sounds; U/V is short for unvoiced/voiced. The main distinction between the two is that voiced sounds are produced by an oscillation of the vocal cords, whereas unvoiced sounds are produced by blocking and restricting the airflow with the lips, tongue, palate, throat, and larynx.
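As a rough illustration of why the two kinds of sound can be told apart automatically, the sketch below compares the zero-crossing rate of a periodic tone (standing in for a voiced sound) with that of white noise (standing in for an unvoiced sound). Zero-crossing rate is a common textbook heuristic and is not necessarily what the EVOC 20 PolySynth uses. The function name, sample rate, frame length, and test signals are all assumptions made for this example.

```python
import math
import random

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)

SAMPLE_RATE = 16000  # assumed sample rate in Hz
N = 1024             # assumed frame length in samples

# Stand-in for a voiced sound: a 150 Hz sine wave (periodic, few sign changes).
tone = [math.sin(2 * math.pi * 150 * n / SAMPLE_RATE) for n in range(N)]

# Stand-in for an unvoiced sound: uniform white noise (frequent sign changes).
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(N)]

tone_zcr = zero_crossing_rate(tone)
noise_zcr = zero_crossing_rate(noise)
print(f"tone ZCR:  {tone_zcr:.3f}")
print(f"noise ZCR: {noise_zcr:.3f}")
```

The tone changes sign only about twice per cycle, while the noise changes sign on roughly half of all sample pairs, so a simple threshold between the two values would separate them in this idealized case. Real speech is much less clear-cut.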

If speech containing voiced and unvoiced sounds is used as a vocoder analysis signal but the synthesis engine doesn’t differentiate between voiced and unvoiced sounds, the result sounds rather weak. To avoid this problem, the synthesis section of the vocoder must produce different sounds for the voiced and unvoiced parts of the signal.

The EVOC 20 PolySynth includes an Unvoiced/Voiced detector for this specific purpose. This unit detects the unvoiced portions of the sound in the analysis signal and then replaces the corresponding portions of the synthesis signal with noise, with a mixture of noise and synthesizer signal, or with the original signal. When the U/V detector detects voiced parts, it passes this information to the Synthesis section, which uses the normal synthesis signal for these portions.
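The substitution behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the plug-in's actual implementation: it assumes that a per-frame voiced/unvoiced decision is already available, and the function name `substitute_unvoiced`, the `mode` values, and the `noise_level` parameter are hypothetical names invented for this example.

```python
import random

def substitute_unvoiced(synth_frames, analysis_frames, voiced_flags,
                        mode="noise", noise_level=0.5, seed=0):
    """Return output frames, replacing synthesis frames flagged as unvoiced.

    mode is a hypothetical switch for this sketch:
      "noise"    -> unvoiced frames become white noise
      "mix"      -> unvoiced frames become noise plus synthesizer signal
      "original" -> unvoiced frames become the analysis (original) signal
    """
    rng = random.Random(seed)
    output = []
    for synth, analysis, voiced in zip(synth_frames, analysis_frames, voiced_flags):
        if voiced:
            # Voiced portion: pass the normal synthesis signal through.
            output.append(list(synth))
            continue
        # Unvoiced portion: generate noise of the same length as the frame.
        noise = [noise_level * rng.uniform(-1.0, 1.0) for _ in synth]
        if mode == "noise":
            output.append(noise)
        elif mode == "mix":
            output.append([s + n for s, n in zip(synth, noise)])
        elif mode == "original":
            output.append(list(analysis))
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return output

# Two tiny frames: the first flagged voiced, the second flagged unvoiced.
synth_frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
analysis_frames = [[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]]
voiced_flags = [True, False]

result = substitute_unvoiced(synth_frames, analysis_frames, voiced_flags,
                             mode="original")
print(result)
```

In this usage example the voiced frame keeps its synthesis samples, while the unvoiced frame is replaced by the corresponding frame of the analysis signal. Switching `mode` to `"noise"` or `"mix"` exercises the other two substitutions.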

A formant is a peak in the frequency spectrum of a sound. In the context of human voices, formants are the key component that enables listeners to distinguish between different vowel sounds, based purely on the frequency content of those sounds. Formants in human speech and singing are produced by the vocal tract, with most vowel sounds containing four or more formants.

Figure. U/V Detection parameters.

U/V detection parameters